Abstract

Csiszár's f-divergence of two probability distributions was extended to the
quantum case by the author in 1985. In the quantum setting positive semidefinite
matrices take the place of probability distributions, and the quantum generalization
is called quasi-entropy. It is related to other important concepts such as
covariance, quadratic costs, Fisher information, the Cramér-Rao inequality and
uncertainty relations. A conjecture about the scalar curvature of a Fisher information
geometry is explained. These subjects are reviewed in detail in the
matrix setting, and at the very end the von Neumann algebra approach is sketched
briefly.
Let $X$ be a finite space with probability measures $p$ and $q$. Their relative entropy
or divergence

$$D(p\|q) = \sum_{x\in X} p(x)\log\frac{p(x)}{q(x)}$$

was introduced by Kullback and Leibler in 1951 [27]. More precisely, if $p(x) = q(x) = 0$,
then $\log(p(x)/q(x)) = 0$, and if $p(x) \ne 0$ but $q(x) = 0$ for some $x \in X$, then
$\log(p(x)/q(x)) = +\infty$.

A possible generalization of the relative entropy is the f-divergence introduced by
Csiszár:

$$D_f(p\|q) = \sum_{x\in X} q(x)\, f\!\left(\frac{p(x)}{q(x)}\right) \tag{1}$$

with a real function $f(x)$ defined for $x > 0$ [H]. For the convex function $f(x) = x\log x$
the relative entropy is obtained.
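The reduction of (1) to the relative entropy can be checked numerically. The following is a minimal sketch, assuming strictly positive $q$; the function names and the sample distributions are illustrative choices, not part of the original text.

```python
import math

def f_divergence(p, q, f):
    """D_f(p||q) = sum_x q(x) f(p(x)/q(x)), assuming q(x) > 0 for all x."""
    return sum(qx * f(px / qx) for px, qx in zip(p, q))

def x_log_x(x):
    # Convention 0 log 0 = 0.
    return 0.0 if x == 0 else x * math.log(x)

def relative_entropy(p, q):
    """Kullback-Leibler divergence, assuming q(x) > 0 for all x."""
    return sum(px * math.log(px / qx) for px, qx in zip(p, q) if px > 0)

# Hypothetical sample distributions on a three-point space.
p = [0.5, 0.3, 0.2]
q = [0.25, 0.25, 0.5]

d_f = f_divergence(p, q, x_log_x)   # f-divergence with f(x) = x log x
d_kl = relative_entropy(p, q)       # direct Kullback-Leibler formula
```

Both numbers agree up to rounding, since $q\,(p/q)\log(p/q) = p\log(p/q)$ termwise.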

This paper first gives a rather short survey of the f-divergence and then turns to the
non-commutative (algebraic, or quantum) generalization. Roughly speaking this means
that the positive n-tuples p and q are replaced by positive semidefinite n x n matrices,
while the main questions of the study remain rather similar to the probabilistic case. The
quantum generalization was called quasi-entropy and it is related to other important
concepts such as covariance, quadratic costs, Fisher information, the Cramér-Rao inequality
and uncertainty relations. These subjects are reviewed in detail in the matrix setting,
and at the very end the von Neumann algebra approach is sketched briefly. When the
details are not presented, precise references are given.

1 f-divergence and its use

Let $\mathcal{F}$ be the set of continuous convex functions $\mathbb{R}^+ \to \mathbb{R}$. The following result explains
the importance of convexity.

Let $\mathcal{A}$ be a partition of $X$. If $p$ is a probability distribution on $X$, then
$p_{\mathcal{A}}(A) := \sum_{x\in A} p(x)$ becomes a probability distribution on $\mathcal{A}$.

Theorem 1 Let $\mathcal{A}$ be a partition of $X$ and $p, q$ be probability distributions on $X$. If
$f \in \mathcal{F}$, then

$$D_f(p_{\mathcal{A}}\|q_{\mathcal{A}}) \le D_f(p\|q).$$

The inequality in the theorem is the monotonicity of the f-divergence. A particular
case, obtained from the trivial partition $\mathcal{A} = \{X\}$, is

$$f(1) \le D_f(p\|q).$$
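The monotonicity of Theorem 1 can be illustrated on a small example. This is only a sketch: the distributions and the partition below are hypothetical, and $f(x) = x\log x$ is used as one convenient convex function.

```python
import math

def f_divergence(p, q, f):
    """D_f(p||q) = sum_x q(x) f(p(x)/q(x)), assuming q(x) > 0 for all x."""
    return sum(qx * f(px / qx) for px, qx in zip(p, q))

def coarse_grain(p, partition):
    """p_A(A) = sum of p(x) over the points x of each block A."""
    return [sum(p[x] for x in block) for block in partition]

def x_log_x(x):
    return 0.0 if x == 0 else x * math.log(x)

# Hypothetical data on X = {0, 1, 2, 3}.
p = [0.1, 0.4, 0.2, 0.3]
q = [0.25, 0.25, 0.25, 0.25]
partition = [[0, 2], [1, 3]]
trivial_partition = [[0, 1, 2, 3]]

fine = f_divergence(p, q, x_log_x)
coarse = f_divergence(coarse_grain(p, partition),
                      coarse_grain(q, partition), x_log_x)
trivial = f_divergence(coarse_grain(p, trivial_partition),
                       coarse_grain(q, trivial_partition), x_log_x)
```

Here `coarse` does not exceed `fine`, and the trivial partition reproduces the lower bound $f(1) = 0$.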
Theorem 2 Let $f, g \in \mathcal{F}$ and assume that

$$D_f(p\|q) = D_g(p\|q)$$

for every distribution $p$ and $q$. Then there exists a constant $c \in \mathbb{R}$ such that
$f(x) - g(x) = c(x-1)$.

Since the divergence is a kind of informational distance, we want $D_f(p\|p) = 0$ and
require $f(1) = 0$. This is nothing else but a normalization,

$$D_{f+c}(p\|q) = D_f(p\|q) + c.$$

A bit more generally, we can say that if $f(x) - g(x)$ is a linear function, then $D_f$ and
$D_g$ are essentially the same quantities.
It is interesting to remark that $q f(p/q)$ can also be considered as a mean of $p$ and $q$.
In that case the mean of $p$ and $p$ should be $p$, so in the theory of means $f(1) = 1$ is a
different natural requirement.

Set $f^*(x) = x f(x^{-1})$. Then $D_f(p\|q) = D_{f^*}(q\|p)$. The equality $f^* = f$ is the
symmetry condition.
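The relation $D_f(p\|q) = D_{f^*}(q\|p)$ is easy to test numerically. A minimal sketch, assuming strictly positive distributions; the helper name `transpose` is an illustrative choice.

```python
import math

def f_divergence(p, q, f):
    """D_f(p||q) = sum_x q(x) f(p(x)/q(x)), assuming q(x) > 0 for all x."""
    return sum(qx * f(px / qx) for px, qx in zip(p, q))

def transpose(f):
    """Return f*(x) = x f(1/x) for x > 0."""
    return lambda x: x * f(1.0 / x)

def f(x):
    return x * math.log(x)      # f(x) = x log x

f_star = transpose(f)           # works out to f*(x) = -log x

# Hypothetical strictly positive distributions.
p = [0.5, 0.3, 0.2]
q = [0.25, 0.25, 0.5]

lhs = f_divergence(p, q, f)         # D_f(p||q)
rhs = f_divergence(q, p, f_star)    # D_{f*}(q||p)
```

For $f(x) = x\log x$ the transposed function is $-\log x$, which is not equal to $f$, so this $f$ does not satisfy the symmetry condition.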

Example 1 Let $f(x) = |x-1|$. Then

$$D_f(p,q) = \sum_x |p(x) - q(x)| =: V(p,q)$$

is the variational distance of $p$ and $q$. □

Example 2 Let $f(x) = (1-\sqrt{x})^2/2$. Then

$$D_f(p,q) = \frac{1}{2}\sum_x \left(\sqrt{p(x)} - \sqrt{q(x)}\right)^2 =: H^2(p,q)$$

is the squared Hellinger distance of $p$ and $q$. □
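Examples 1 and 2 can be reproduced from the general formula (1). The sketch below uses hypothetical sample distributions and compares the f-divergence with the direct expressions for the variational distance and for half the sum of squared differences of square roots.

```python
import math

def f_divergence(p, q, f):
    """D_f(p||q) = sum_x q(x) f(p(x)/q(x)), assuming q(x) > 0 for all x."""
    return sum(qx * f(px / qx) for px, qx in zip(p, q))

# Hypothetical strictly positive distributions.
p = [0.5, 0.3, 0.2]
q = [0.25, 0.25, 0.5]

# Example 1: f(x) = |x - 1|.
variational = f_divergence(p, q, lambda x: abs(x - 1.0))
variational_direct = sum(abs(a - b) for a, b in zip(p, q))

# Example 2: f(x) = (1 - sqrt(x))^2 / 2.
hellinger_sq = f_divergence(p, q, lambda x: (1.0 - math.sqrt(x)) ** 2 / 2.0)
hellinger_sq_direct = 0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2
                                for a, b in zip(p, q))
```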
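Example 3 The function

$$f_\alpha(t) = \frac{1}{\alpha(1-\alpha)}\left(1 - t^\alpha\right)$$

gives the relative α-entropy

$$S_\alpha(p\|q) = \frac{1}{\alpha(1-\alpha)}\left(1 - \sum_x p(x)^\alpha q(x)^{1-\alpha}\right). \tag{2}$$

The limit $\alpha \to 0$ gives the relative entropy (with the roles of $p$ and $q$ interchanged,
since $f_\alpha(t) \to -\log t$). □

Several other functions appeared in the literature, we list a few of them:

$$f^{(s)}(x) = \frac{1}{s(1-s)}\left(1 + x - x^s - x^{1-s}\right), \qquad 0 < s \ne 1 \quad [9], \tag{3}$$

$$f_\beta(x) = \begin{cases} \dfrac{1}{1-1/\beta}\left((1+x^\beta)^{1/\beta} - 2^{1/\beta-1}(1+x)\right) & \text{if } 0 < \beta \ne 1, \\[2mm] (1+x)\log 2 + x\log x - (1+x)\log(x+1) & \text{if } \beta = 1. \end{cases} \tag{4}$$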

The following result of Csiszár is a characterization (or axiomatization) of the
f-divergence.

Theorem 3 Assume that a number $C(p,q) \in \mathbb{R}$ is associated to probability distributions
on the same set $X$ for all finite sets $X$. If

(a) $C(p,q)$ is invariant under the permutations of the basic set $X$,

(b) if $\mathcal{A}$ is a partition of $X$, then $C(p_{\mathcal{A}}, q_{\mathcal{A}}) \le C(p,q)$ and the equality holds if and
only if

$$p_{\mathcal{A}}(A)\,q(x) = q_{\mathcal{A}}(A)\,p(x)$$

whenever $x \in A \in \mathcal{A}$,

then there exists a convex function $f: \mathbb{R}^+ \to \mathbb{R}$ which is continuous at 0 and $C(p,q) =
D_f(p\|q)$ for every $p$ and $q$.

2 Quantum quasi-entropy 

In the mathematical formalism of quantum mechanics, instead of n-tuples of numbers
one works with n x n complex matrices. They form an algebra and this allows an
algebraic approach. In this approach, a probability density is replaced by a positive
semidefinite matrix of trace 1 which is called a density matrix [39]. The eigenvalues of
a density matrix give a probability density. However, this is not the only probability
density provided by a density matrix. If we write the matrix in a certain orthonormal
basis, then the diagonal elements $p_1, p_2, \ldots, p_n$ form a probability density.

Let $\mathcal{M}$ denote the algebra of n x n matrices with complex entries. For positive definite
matrices $\rho_1, \rho_2 \in \mathcal{M}$, for $A \in \mathcal{M}$ and a function $f: \mathbb{R}^+ \to \mathbb{R}$, the quasi-entropy is
defined as

$$S^A_f(\rho_1\|\rho_2) := \langle A\rho_2^{1/2}, f(\Delta(\rho_1/\rho_2))(A\rho_2^{1/2})\rangle = \mathrm{Tr}\,\rho_2^{1/2}A^* f(\Delta(\rho_1/\rho_2))(A\rho_2^{1/2}), \tag{5}$$

where $\langle B, C\rangle := \mathrm{Tr}\,B^*C$ is the so-called Hilbert-Schmidt inner product and $\Delta(\rho_1/\rho_2):
\mathcal{M} \to \mathcal{M}$ is a linear mapping acting on matrices:

$$\Delta(\rho_1/\rho_2)A = \rho_1 A \rho_2^{-1}.$$

This concept was introduced in [321 El], see also Chapter 7 in [31], and it is the quantum
generalization of the f-entropy of Csiszár used in classical information theory (and
statistics) [3 [30].
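In the commuting case the quasi-entropy (5) reduces to the f-divergence (1). The sketch below assumes $\rho_1 = \mathrm{Diag}(p)$ and $\rho_2 = \mathrm{Diag}(q)$ in a common basis, so that $\Delta(\rho_1/\rho_2)$ multiplies the $(i,j)$ entry of a matrix by $p_i/q_j$; the function names and data are illustrative.

```python
import math

def f_divergence(p, q, f):
    """Classical D_f(p||q) = sum_x q(x) f(p(x)/q(x)), q strictly positive."""
    return sum(qx * f(px / qx) for px, qx in zip(p, q))

def quasi_entropy_diag(p, q, A, f):
    """S^A_f(rho1||rho2) for rho1 = Diag(p), rho2 = Diag(q) (commuting case only).

    A rho2^(1/2) has entries A[i][j] * sqrt(q[j]) and f(Delta) acts on the
    (i, j) entry as multiplication by f(p[i] / q[j]).
    """
    n = len(p)
    return sum(q[j] * f(p[i] / q[j]) * abs(A[i][j]) ** 2
               for i in range(n) for j in range(n))

def x_log_x(x):
    return x * math.log(x)

# Hypothetical eigenvalue lists of two commuting density matrices.
p = [0.5, 0.3, 0.2]
q = [0.25, 0.25, 0.5]
identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]

quantum = quasi_entropy_diag(p, q, identity, x_log_x)   # A = I
classical = f_divergence(p, q, x_log_x)
```

For non-commuting $\rho_1$ and $\rho_2$ one would need the spectral decompositions of both matrices; that case is not covered by this sketch.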

The monotonicity in Theorem 1 is the consequence of the Jensen inequality. A function
$f: \mathbb{R}^+ \to \mathbb{R}$ is called matrix concave if one of the following two equivalent
conditions holds:

$$f(\lambda A + (1-\lambda)B) \ge \lambda f(A) + (1-\lambda) f(B) \tag{6}$$

for every number $0 < \lambda < 1$ and for positive definite square matrices $A$ and $B$ (of the
same size). In the other condition the number $\lambda$ is (heuristically) replaced by a matrix:

$$f(CAC^* + DBD^*) \ge C f(A) C^* + D f(B) D^* \tag{7}$$

if $CC^* + DD^* = I$.






A function $f: \mathbb{R}^+ \to \mathbb{R}$ is called matrix monotone if for positive definite matrices
$A \le B$ the inequality $f(A) \le f(B)$ holds. It is interesting that a matrix monotone
function is matrix concave and a matrix concave function is matrix monotone if it is
bounded from below [T7].

Let $\alpha: \mathcal{M}_0 \to \mathcal{M}$ be a mapping between two matrix algebras. The dual $\alpha^*: \mathcal{M} \to
\mathcal{M}_0$ with respect to the Hilbert-Schmidt inner product is positive if and only if $\alpha$ is
positive. Moreover, $\alpha$ is unital if and only if $\alpha^*$ is trace preserving. $\alpha: \mathcal{M}_0 \to \mathcal{M}$ is
called a Schwarz mapping if

$$\alpha(B^*B) \ge \alpha(B^*)\alpha(B) \tag{8}$$

for every $B \in \mathcal{M}_0$.

The quasi-entropies are monotone and jointly convex [311 [3l].

Theorem 4 Assume that $f: \mathbb{R}^+ \to \mathbb{R}$ is an operator monotone function with $f(0) \ge 0$
and $\alpha: \mathcal{M}_0 \to \mathcal{M}$ is a unital Schwarz mapping. Then

$$S^A_f(\alpha^*(\rho_1), \alpha^*(\rho_2)) \ge S^{\alpha(A)}_f(\rho_1, \rho_2) \tag{9}$$

holds for $A \in \mathcal{M}_0$ and for invertible density matrices $\rho_1$ and $\rho_2$ from the matrix algebra
$\mathcal{M}$.



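Proof: The proof is based on inequalities for operator monotone and operator concave
functions. First note that

$$S^A_{f+c}(\alpha^*(\rho_1), \alpha^*(\rho_2)) = S^A_f(\alpha^*(\rho_1), \alpha^*(\rho_2)) + c\,\mathrm{Tr}\,\rho_2\,\alpha(A^*A)$$

and

$$S^{\alpha(A)}_{f+c}(\rho_1, \rho_2) = S^{\alpha(A)}_f(\rho_1, \rho_2) + c\,\mathrm{Tr}\,\rho_2\,(\alpha(A)^*\alpha(A))$$

for a positive constant $c$. Due to the Schwarz inequality (8), we may assume that
$f(0) = 0$.

Let $\Delta := \Delta(\rho_1/\rho_2)$ and $\Delta_0 := \Delta(\alpha^*(\rho_1)/\alpha^*(\rho_2))$. The operator

$$V X \alpha^*(\rho_2)^{1/2} = \alpha(X)\rho_2^{1/2} \qquad (X \in \mathcal{M}_0) \tag{10}$$

is a contraction:

$$\|\alpha(X)\rho_2^{1/2}\|^2 = \mathrm{Tr}\,\rho_2(\alpha(X)^*\alpha(X)) \le \mathrm{Tr}\,\rho_2(\alpha(X^*X)) = \mathrm{Tr}\,\alpha^*(\rho_2)X^*X = \|X\alpha^*(\rho_2)^{1/2}\|^2,$$

since the Schwarz inequality is applicable to $\alpha$. A similar simple computation gives that

$$V^*\Delta V \le \Delta_0. \tag{11}$$

Since $f$ is operator monotone, we have $f(\Delta_0) \ge f(V^*\Delta V)$. Recall that $f$ is operator
concave, therefore $f(V^*\Delta V) \ge V^* f(\Delta) V$ and we conclude

$$f(\Delta_0) \ge V^* f(\Delta) V. \tag{12}$$

Application to the vector $A\alpha^*(\rho_2)^{1/2}$ gives the statement, since $V A\alpha^*(\rho_2)^{1/2} = \alpha(A)\rho_2^{1/2}$. □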



It is remarkable that for a multiplicative $\alpha$ we do not need the condition $f(0) \ge 0$.
Moreover, $V^*\Delta V = \Delta_0$ and we do not need the matrix monotonicity of the function $f$.
In this case the only condition is the matrix concavity, analogously to Theorem 1.

If we apply the monotonicity (9) to the embedding $\alpha(X) = X \oplus X$ of $\mathcal{M}$ into $\mathcal{M} \oplus \mathcal{M}$
and to the densities $\rho_1 = \lambda E_1 \oplus (1-\lambda)F_1$, $\rho_2 = \lambda E_2 \oplus (1-\lambda)F_2$, then we obtain the
joint concavity of the quasi-entropy:

$$\lambda S^A_f(E_1, E_2) + (1-\lambda) S^A_f(F_1, F_2) \le S^A_f(\lambda E_1 + (1-\lambda)F_1, \lambda E_2 + (1-\lambda)F_2).$$

The case $f(t) = t^\alpha$ is the famous Lieb's concavity theorem: $\mathrm{Tr}\,A\rho^\alpha A^*\rho^{1-\alpha}$ is
concave in $\rho$ [29].

The concept of quasi-entropy includes some important special cases. If $\rho_2$ and $\rho_1$
are different and $A = I$, then we have a kind of relative entropy. For $f(x) = x\log x$
we have Umegaki's relative entropy $S(\rho_1\|\rho_2) = \mathrm{Tr}\,\rho_1(\log\rho_1 - \log\rho_2)$. (If we want a
matrix monotone function, then we can take $f(x) = \log x$ and then we get $-S(\rho_2\|\rho_1)$.)
Umegaki's relative entropy is the most important example, therefore the function $f$ will
be chosen to be matrix convex. This makes the probabilistic and non-commutative
situation compatible as one can see in the next argument.

Let $\rho_1$ and $\rho_2$ be density matrices in $\mathcal{M}$. If in a certain basis they have diagonals
$p = (p_1, p_2, \ldots, p_n)$ and $q = (q_1, q_2, \ldots, q_n)$, then the monotonicity theorem gives the
inequality

$$D_f(p\|q) \le S_f(\rho_1\|\rho_2) \tag{13}$$

for a matrix convex function $f$. If $\rho_1$ and $\rho_2$ commute, then we can take the common
eigenbasis and in (13) the equality appears. It is not trivial that otherwise the inequality
is strict.

If $\rho_1$ and $\rho_2$ are different, then there is a choice for $p$ and $q$ such that they are different
as well. Then

$$0 < D_f(p\|q) \le S_f(\rho_1\|\rho_2).$$

Conversely, if $S_f(\rho_1\|\rho_2) = 0$, then $p = q$ for every basis and this implies $\rho_1 = \rho_2$. For the
relative entropy, a deeper result is known. The Pinsker-Csiszár inequality says that

$$(\|p - q\|_1)^2 \le 2D(p\|q). \tag{14}$$

This extends to the quantum case as

$$(\|\rho_1 - \rho_2\|_1)^2 \le 2S(\rho_1\|\rho_2), \tag{15}$$

see [22], or [SSI Chap. 3].
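For commuting (diagonal) density matrices the quantum inequality (15) coincides with the classical inequality (14), which can be spot-checked numerically. The sketch below uses a few hypothetical pairs of distributions and is an illustration on sample data, not a proof.

```python
import math

def relative_entropy(p, q):
    """D(p||q) = sum_x p(x) log(p(x)/q(x)), assuming q(x) > 0 for all x."""
    return sum(px * math.log(px / qx) for px, qx in zip(p, q) if px > 0)

def l1_distance(p, q):
    """||p - q||_1 = sum_x |p(x) - q(x)|."""
    return sum(abs(px - qx) for px, qx in zip(p, q))

# Hypothetical pairs of distributions (diagonals of commuting density matrices).
pairs = [
    ([0.5, 0.3, 0.2], [0.25, 0.25, 0.5]),
    ([0.9, 0.1], [0.5, 0.5]),
    ([0.6, 0.4], [0.6, 0.4]),
]

# gap = 2 D(p||q) - ||p - q||_1^2 should be non-negative for every pair.
gaps = [2 * relative_entropy(p, q) - l1_distance(p, q) ** 2 for p, q in pairs]
```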
Problem 1 It would be interesting to extend Theorem 3 of Csiszár to the quantum
case. If we require monotonicity and specify the condition for equality, then a function $f$
is provided by Theorem 3, but for non-commuting densities the conclusion is not clear.

Example 4 The function

$$f_\alpha(x) = \frac{1}{\alpha(1-\alpha)}\left(1 - x^\alpha\right)$$

is matrix monotone decreasing for $\alpha \in (-1,1)$. (For $\alpha = 0$, the limit is taken and it is
$-\log x$.) Then the relative entropies of degree $\alpha$ are produced:

$$S_\alpha(\rho_2\|\rho_1) := \frac{1}{\alpha(1-\alpha)}\,\mathrm{Tr}\,(I - \rho_1^\alpha\rho_2^{-\alpha})\rho_2.$$

These quantities are essential in the quantum case. □

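If $\rho_2 = \rho_1 = \rho$ and $A, B \in \mathcal{M}$ are arbitrary, then one arrives at the generalized
covariance [38]:

$$\mathrm{qCov}^f_\rho(A,B) := \langle A\rho^{1/2}, f(\Delta(\rho/\rho))(B\rho^{1/2})\rangle - (\mathrm{Tr}\,\rho A^*)(\mathrm{Tr}\,\rho B). \tag{16}$$

If $\rho$, $A$ and $B$ commute, then this becomes $f(1)\,\mathrm{Tr}\,\rho A^*B -
(\mathrm{Tr}\,\rho A^*)(\mathrm{Tr}\,\rho B)$. This shows that the normalization $f(1) = 1$ is natural. The generalized
covariance $\mathrm{qCov}^f_\rho(A,B)$ is a sesquilinear form and it is determined by $\mathrm{qCov}^f_\rho(A,A)$ when
$\{A \in \mathcal{M} : \mathrm{Tr}\,\rho A = 0\}$. Formally, this is a quasi-entropy and Theorem 4 applies if $f$ is
matrix monotone. If we require the symmetry condition $\mathrm{qCov}^f_\rho(A,A) = \mathrm{qCov}^f_\rho(A^*,A^*)$,
then $f$ should have the symmetry $x f(x^{-1}) = f(x)$.

Assume that $\mathrm{Tr}\,\rho A = \mathrm{Tr}\,\rho B = 0$ and $\rho = \mathrm{Diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)$. Then

$$\mathrm{qCov}^f_\rho(A,B) = \sum_{i,j} \lambda_j f(\lambda_i/\lambda_j)\,\overline{A_{ij}}\,B_{ij}. \tag{17}$$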
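Formula (17) can be cross-checked against the symmetrized covariance obtained for $f(t) = (t+1)/2$. The sketch below works in the eigenbasis of $\rho$ and evaluates (17) in the form $\sum_{i,j}\lambda_j f(\lambda_i/\lambda_j)\overline{A_{ij}}B_{ij}$, which follows from (16) with $\Delta(\rho/\rho)X = \rho X\rho^{-1}$. The matrices and helper functions are hypothetical illustrations.

```python
def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def adjoint(X):
    n = len(X)
    return [[X[j][i].conjugate() for j in range(n)] for i in range(n)]

def trace(X):
    return sum(X[i][i] for i in range(len(X)))

def center(X, lam):
    """Subtract (Tr rho X) I so that Tr rho X = 0 for rho = Diag(lam)."""
    mean = sum(l * X[i][i] for i, l in enumerate(lam))
    n = len(X)
    return [[X[i][j] - (mean if i == j else 0) for j in range(n)]
            for i in range(n)]

def qcov_diag(lam, A, B, f):
    """Formula (17) for rho = Diag(lam) and centered A, B."""
    n = len(lam)
    return sum(lam[j] * f(lam[i] / lam[j]) * A[i][j].conjugate() * B[i][j]
               for i in range(n) for j in range(n))

lam = [0.5, 0.3, 0.2]                                   # eigenvalues of rho
rho = [[lam[i] if i == j else 0.0 for j in range(3)] for i in range(3)]
A = center([[1, 2 + 1j, 0], [0.5, -1, 1j], [3, 1, 2]], lam)
B = center([[0, 1, 1j], [1 - 1j, 2, 0], [0.5, 0, -1]], lam)

lhs = qcov_diag(lam, A, B, lambda x: (1 + x) / 2)

# Symmetrized covariance (1/2) Tr rho (A* B + B A*); the means vanish here.
AsB, BAs = matmul(adjoint(A), B), matmul(B, adjoint(A))
sym = [[AsB[i][j] + BAs[i][j] for j in range(3)] for i in range(3)]
rhs = trace(matmul(rho, sym)) / 2
```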

A matrix monotone function $f: \mathbb{R}^+ \to \mathbb{R}^+$ will be called standard if $x f(x^{-1}) = f(x)$
and $f(1) = 1$. A standard function $f$ admits a canonical representation

$$f(t) = \frac{1+t}{2}\exp\int_0^1 \frac{(1-t)^2(\lambda^2-1)}{(\lambda+t)(1+\lambda t)(\lambda+1)^2}\,h(\lambda)\,d\lambda, \tag{18}$$

where $h: [0,1] \to [0,1]$ is a measurable function [18].

The usual symmetrized covariance corresponds to the function $f(t) = (t+1)/2$:

$$\mathrm{Cov}_\rho(A,B) := \frac{1}{2}\,\mathrm{Tr}\,(\rho(A^*B + BA^*)) - (\mathrm{Tr}\,\rho A^*)(\mathrm{Tr}\,\rho B).$$

The interpretation of the covariances is not at all clear. In the next section they will be
called quadratic cost functions. It turns out that there is a one-to-one correspondence
between quadratic cost functions and Fisher informations.
3 Fisher information

3.1 The Cramér-Rao inequality

The Cramér-Rao inequality belongs to the basics of estimation theory in mathematical
statistics. Its quantum analog was discovered immediately after the foundation of
mathematical quantum estimation theory in the 1960's, see the book [21] of Helstrom, or the
book [21] of Holevo for a rigorous summary of the subject. Although both the classical
Cramér-Rao inequality and its quantum analog are as trivial as the Schwarz inequality,
the subject attracts a lot of attention because it is located on the highly exciting boundary
of statistics, information and quantum theory.

As a starting point we give a very general form of the quantum Cramér-Rao inequality
in the simple setting of finite dimensional quantum mechanics. For $\theta \in (-\varepsilon, \varepsilon) \subset \mathbb{R}$ a
statistical operator $\rho(\theta)$ is given and the aim is to estimate the value of the parameter
$\theta$ close to 0. Formally $\rho(\theta)$ is an n x n positive semidefinite matrix of trace 1 which describes
a mixed state of a quantum mechanical system and we assume that $\rho(\theta)$ is smooth (in
$\theta$). Assume that an estimation is performed by the measurement of a self-adjoint matrix
$A$ playing the role of an observable. $A$ is called a locally unbiased estimator if

$$\frac{\partial}{\partial\theta}\,\mathrm{Tr}\,\rho(\theta)A\,\Big|_{\theta=0} = 1. \tag{19}$$

This condition holds if $A$ is an unbiased estimator for $\theta$, that is

$$\mathrm{Tr}\,\rho(\theta)A = \theta \qquad (\theta \in (-\varepsilon, \varepsilon)). \tag{20}$$

To require this equality for all values of the parameter is a serious restriction on the
observable $A$ and we prefer to use the weaker condition (19).

Let $\varphi_0[K,L]$ be an inner product (or quadratic cost function) on the linear space of
self-adjoint matrices. When $\rho(\theta)$ is smooth in $\theta$, as was already assumed above, then

$$\frac{\partial}{\partial\theta}\,\mathrm{Tr}\,\rho(\theta)B\,\Big|_{\theta=0} = \varphi_0[B,L] \tag{21}$$

with some $L = L^*$. From (19) and (21), we have $\varphi_0[A,L] = 1$ and the Schwarz inequality
yields

$$\varphi_0[A,A] \ge \frac{1}{\varphi_0[L,L]}. \tag{22}$$

This is the celebrated inequality of Cramér-Rao type for the locally unbiased
estimator.

The right-hand-side of (22) is independent of the estimator and provides a lower
bound for the quadratic cost. The denominator $\varphi_0[L,L]$ appears to be in the role of
Fisher information here. We call it quantum Fisher information with respect to the
cost function $\varphi_0[\,\cdot\,,\,\cdot\,]$. This quantity depends on the tangent of the curve $\rho(\theta)$. If the
densities $\rho(\theta)$ and the estimator $A$ commute, then

$$L = \rho_0^{-1}\,\frac{d\rho(\theta)}{d\theta}\Big|_{\theta=0} \quad\text{and}\quad \varphi_0[L,L] = \mathrm{Tr}\,\rho_0^{-1}\left(\frac{d\rho(\theta)}{d\theta}\Big|_{\theta=0}\right)^2 = \mathrm{Tr}\,\rho_0\left(\rho_0^{-1}\frac{d\rho(\theta)}{d\theta}\Big|_{\theta=0}\right)^2. \tag{23}$$

We want to conclude from the above argument that whatever Fisher information
and generalized variance are in the quantum mechanical setting, they are very strongly
related. In an earlier work [361 E21 we used a monotonicity condition to limit
the class of Riemannian metrics on the state space of a quantum system. The
monotone metrics are called Fisher information quantities in this paper.

Since the necessary and sufficient condition for the equality in the Schwarz inequality
is well-known, we are able to analyze the case of equality in (22). The condition for
equality is

$$A = \lambda L$$

for some constant $\lambda \in \mathbb{R}$. Therefore the necessary and sufficient condition for equality
in (22) is

$$\dot\rho_0 := \frac{\partial}{\partial\theta}\rho(\theta)\Big|_{\theta=0} = \lambda^{-1}\mathbb{J}_0(A), \tag{24}$$

where $\mathbb{J}_0$ is the positive operator representing the cost function, $\varphi_0[K,L] = \mathrm{Tr}\,K\,\mathbb{J}_0(L)$.
Therefore there exists a unique locally unbiased estimator $A = \lambda\mathbb{J}_0^{-1}(\dot\rho_0)$, where the
number $\lambda$ is chosen in such a way that the condition (19) is satisfied.

Example 5 Let

$$\rho(\theta) := \rho + \theta B,$$

where $\rho$ is a positive definite density and $B$ is a self-adjoint traceless operator. $A$ is
locally unbiased when $\mathrm{Tr}\,AB = 1$. In particular,

$$A = \frac{B}{\mathrm{Tr}\,B^2}$$

is a locally unbiased estimator and in the Cramér-Rao inequality (22) the equality holds
when $\varphi_0[X,Y] = \mathrm{Tr}\,XY$, that is, $\mathbb{J}_0$ is the identity.

If $\mathrm{Tr}\,\rho B = 0$ holds in addition, then the estimator is unbiased. □
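Example 5 can be followed step by step on a concrete matrix. The sketch below assumes the cost function $\varphi_0[X,Y] = \mathrm{Tr}\,XY$, for which the operator $L$ in (21) equals $B$; the 2 x 2 matrix $B$ is a hypothetical choice.

```python
def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace(X):
    return sum(X[i][i] for i in range(len(X)))

# Hypothetical self-adjoint, traceless direction B of the curve rho + theta * B.
B = [[1.0, 2.0], [2.0, -1.0]]

tr_B2 = trace(matmul(B, B))
A = [[b / tr_B2 for b in row] for row in B]       # estimator A = B / Tr B^2

# d/dtheta Tr (rho + theta B) A = Tr A B, which should equal 1 by (19).
derivative = trace(matmul(A, B))

# With phi_0[X, Y] = Tr XY, relation (21) gives L = B.
cost = trace(matmul(A, A))        # phi_0[A, A]
fisher = trace(matmul(B, B))      # phi_0[L, L]
```

The quadratic cost of $A$ equals the reciprocal of the Fisher information, so (22) holds with equality in this example.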



3.2 Coarse-graining and monotonicity 

In the simple setting in which the state is described by a density matrix, a coarse- 
graining is an affine mapping sending density matrices into density matrices. Such a 
mapping extends to all matrices and provides a positivity and trace preserving linear 
transformation. A common example of coarse-graining sends the density matrix $\rho_{12}$ of a
composite system 1 + 2 into the (reduced) density matrix $\rho_1$ of component 1. There are
several reasons to assume complete positivity of a coarse-graining and we do so.
Assume that $\rho(\theta)$ is a smooth curve of density matrices with tangent $A := \dot\rho$ at $\rho$.
The quantum Fisher information $F_\rho(A)$ is an information quantity associated with the
pair $(\rho, A)$; it appeared in the Cramér-Rao inequality above, and the classical Fisher
information gives a bound for the variance of a locally unbiased estimator. Let now
$\beta$ be a coarse-graining. Then $\beta(\rho(\theta))$ is another curve in the state space. Due to the
linearity of $\beta$, the tangent at $\beta(\rho_0)$ is $\beta(A)$. As is usual in statistics, information
cannot be gained by coarse-graining, therefore we expect that the Fisher information at
the density matrix $\rho_0$ in the direction $A$ must be larger than the Fisher information at
$\beta(\rho_0)$ in the direction $\beta(A)$. This is the monotonicity property of the Fisher information
under coarse-graining:

$$F_\rho(A) \ge F_{\beta(\rho)}(\beta(A)). \tag{25}$$

Although we do not want to have a concrete formula for the quantum Fisher information,
we require that this monotonicity condition must hold. Another requirement is that
$F_\rho(A)$ should be quadratic in $A$, in other words there exists a non-degenerate real bilinear
form $\gamma_\rho(A,B)$ on the self-adjoint matrices such that

$$F_\rho(A) = \gamma_\rho(A,A). \tag{26}$$

The requirements (25) and (26) are strong enough to obtain a reasonable but still wide
class of possible quantum Fisher informations.

We may assume that

$$\gamma_\rho(A,B) = \mathrm{Tr}\,A\,\mathbb{J}_\rho^{-1}(B^*) \tag{27}$$

for an operator $\mathbb{J}_\rho$ acting on matrices. (This formula expresses the inner product $\gamma_\rho$
by means of the Hilbert-Schmidt inner product and the positive linear operator $\mathbb{J}_\rho$.) In
terms of the operator $\mathbb{J}_\rho$ the monotonicity condition reads as

$$\beta^*\,\mathbb{J}_{\beta(\rho)}^{-1}\,\beta \le \mathbb{J}_\rho^{-1} \tag{28}$$

for every coarse-graining $\beta$. ($\beta^*$ stands for the adjoint of $\beta$ with respect to the
Hilbert-Schmidt product. Recall that $\beta$ is completely positive and trace preserving if and only if
$\beta^*$ is completely positive and unital.) On the other hand the latter condition is equivalent
to

$$\beta\,\mathbb{J}_\rho\,\beta^* \le \mathbb{J}_{\beta(\rho)}. \tag{29}$$

We proved the following theorem in [SUj.

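Theorem 5 If for every invertible density matrix $\rho \in M_n(\mathbb{C})$ a positive definite
sesquilinear form $\gamma_\rho: M_n(\mathbb{C}) \times M_n(\mathbb{C}) \to \mathbb{C}$ is given such that

(1) the monotonicity

$$\gamma_\rho(A,A) \ge \gamma_{\beta(\rho)}(\beta(A), \beta(A))$$

holds for all completely positive coarse-grainings $\beta: M_n(\mathbb{C}) \to M_m(\mathbb{C})$,

(2) $\gamma_\rho(A,A)$ is continuous in $\rho$ for every fixed $A$,

(3) $\gamma_\rho(A,A) = \gamma_\rho(A^*,A^*)$,

(4) $\gamma_\rho(A,A) = \mathrm{Tr}\,\rho^{-1}A^2$ if $A$ is self-adjoint and $A\rho = \rho A$,

then there exists a unique standard operator monotone function $f: \mathbb{R}^+ \to \mathbb{R}$ such that

$$\gamma^f_\rho(A,A) = \mathrm{Tr}\,A\,\mathbb{J}_\rho^{-1}(A) \quad\text{and}\quad \mathbb{J}_\rho = f(\mathbb{L}_\rho\mathbb{R}_\rho^{-1})\,\mathbb{R}_\rho,$$

where the linear transformations $\mathbb{L}_\rho$ and $\mathbb{R}_\rho$ acting on matrices are the left and right
multiplications, that is

$$\mathbb{L}_\rho(X) = \rho X \quad\text{and}\quad \mathbb{R}_\rho(X) = X\rho.$$

The above $\gamma_\rho(A,A)$ is formally a quasi-entropy, $S^{A\rho^{-1}}_{1/f}(\rho,\rho)$, however this form is not
suitable to show the monotonicity. Assume that $\rho = \mathrm{Diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)$. Then, for
self-adjoint $A$,

$$\gamma^f_\rho(A,A) = \sum_{i,j} \frac{1}{\lambda_j f(\lambda_i/\lambda_j)}\,|A_{ij}|^2. \tag{30}$$

It is clear from this formula that the Fisher information is affine in the function
$1/f$. Therefore, Hansen's canonical representation of the reciprocal of a standard
operator monotone function can be used [19].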



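Theorem 6 If $f: \mathbb{R}^+ \to \mathbb{R}^+$ is a standard operator monotone function, then

$$\frac{1}{f(t)} = \int_0^1 \frac{1+\lambda}{2}\left(\frac{1}{t+\lambda} + \frac{1}{1+t\lambda}\right)d\mu(\lambda),$$

where $\mu$ is a probability measure on [0, 1].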

The theorem implies that the set $\{1/f : f \text{ is standard operator monotone}\}$ is convex
and gives the extremal points

$$g_\lambda(t) := \frac{1+\lambda}{2}\left(\frac{1}{t+\lambda} + \frac{1}{1+t\lambda}\right) \qquad (0 \le \lambda \le 1). \tag{31}$$

One can compute directly that

$$\frac{\partial}{\partial\lambda}\,g_\lambda(x) = -\frac{(1-\lambda^2)(x+1)(x-1)^2}{2(x+\lambda)^2(1+x\lambda)^2}.$$

Hence $g_\lambda$ is decreasing in the parameter $\lambda$. For $\lambda = 0$ we have the largest function
$g_0(t) = (t+1)/(2t)$ and for $\lambda = 1$ the smallest is $g_1(t) = 2/(t+1)$. (Note that this was
also obtained in the setting of positive operator means [26], harmonic and arithmetic
means.)
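The monotone dependence of $g_\lambda$ on $\lambda$ and the stated derivative can be checked numerically. The sketch below compares the closed-form derivative with a central finite difference at one hypothetical point; the step size and the sample values are arbitrary choices.

```python
def g(lam, x):
    """Extremal function (31): g_lam(x) = (1+lam)/2 * (1/(x+lam) + 1/(1+x*lam))."""
    return (1 + lam) / 2 * (1 / (x + lam) + 1 / (1 + x * lam))

def dg_dlam(lam, x):
    """Closed-form derivative of g_lam(x) with respect to lam."""
    return (-(1 - lam ** 2) * (x + 1) * (x - 1) ** 2
            / (2 * (x + lam) ** 2 * (1 + x * lam) ** 2))

x = 2.5                                   # hypothetical evaluation point
lams = [0.0, 0.25, 0.5, 0.75, 1.0]
values = [g(lam, x) for lam in lams]      # should be non-increasing in lam

h = 1e-6                                  # finite-difference step
numeric = (g(0.5 + h, x) - g(0.5 - h, x)) / (2 * h)
```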
Via the operator $\mathbb{J}_\rho$, each monotone Fisher information determines a quantity

$$\varphi_\rho[A,A] := \mathrm{Tr}\,A\,\mathbb{J}_\rho(A) \tag{32}$$

which is a quadratic cost functional. According to (29) (or Theorem 4) this possesses
the monotonicity property

$$\varphi_\rho[\beta^*(A), \beta^*(A)] \le \varphi_{\beta(\rho)}[A,A]. \tag{33}$$

Since (28) and (29) are equivalent we observe a one-to-one correspondence between
monotone Fisher informations and monotone quadratic cost functions.

Theorem 7 If for every invertible density matrix $\rho \in M_n(\mathbb{C})$ a positive definite
sesquilinear form $\varphi_\rho: M_n(\mathbb{C}) \times M_n(\mathbb{C}) \to \mathbb{C}$ is given such that

(1) the monotonicity (33) holds for all completely positive coarse-grainings $\beta: M_n(\mathbb{C}) \to M_m(\mathbb{C})$,

(2) $\varphi_\rho[A,A]$ is continuous in $\rho$ for every fixed $A$,

(3) $\varphi_\rho[A,A] = \varphi_\rho[A^*,A^*]$,

(4) $\varphi_\rho[A,A] = \mathrm{Tr}\,\rho A^2$ if $A$ is self-adjoint and $A\rho = \rho A$,

then there exists a unique standard operator monotone function $f: \mathbb{R}^+ \to \mathbb{R}$ such that

$$\varphi^f_\rho[A,A] = \mathrm{Tr}\,A\,\mathbb{J}_\rho(A)$$

with the operator $\mathbb{J}_\rho$ defined in Theorem 5.

Any such cost function has the property $\varphi_\rho[A,B] = \mathrm{Tr}\,\rho A^*B$ when $\rho$ commutes with
$A$ and $B$. The examples below show that this is not so in general.

Example 6 Among the standard operator monotone functions, $f_a(t) = (1+t)/2$ is
maximal. This leads to the fact that among all monotone quantum Fisher informations
there is a smallest one which corresponds to the function $f_a(t)$. In this case

$$F^{\min}_\rho(A) = \mathrm{Tr}\,AL = \mathrm{Tr}\,\rho L^2, \quad\text{where}\quad \rho L + L\rho = 2A. \tag{34}$$

For the purpose of a quantum Cramér-Rao inequality the minimal quantity seems to be
the best, since the inverse gives the largest lower bound. In fact, the matrix $L$ has been
used for a long time under the name of symmetric logarithmic derivative, see [21]
and [21]. In this example the quadratic cost function is

$$\varphi_\rho[A,B] = \frac{1}{2}\,\mathrm{Tr}\,\rho(AB + BA) \tag{35}$$

and we have

$$\mathbb{J}_\rho(B) = \frac{1}{2}(\rho B + B\rho) \quad\text{and}\quad \mathbb{J}_\rho^{-1}(A) = \int_0^\infty e^{-t\rho/2} A\,e^{-t\rho/2}\,dt \tag{36}$$
for the operator $\mathbb{J}$ of the previous section.

To see the second formula of (36), set $A(t) := e^{-t\rho/2} A\,e^{-t\rho/2}$. Then

$$\frac{d}{dt}A(t) = -\frac{1}{2}(\rho A(t) + A(t)\rho)$$

and

$$\int_0^\infty \frac{1}{2}(\rho A(t) + A(t)\rho)\,dt = \big[-A(t)\big]_0^\infty = A.$$

Hence

$$\mathbb{J}_\rho\left(\int_0^\infty A(t)\,dt\right) = A.$$
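In the eigenbasis of $\rho$ the symmetric logarithmic derivative of Example 6 has the entries $L_{ij} = 2A_{ij}/(\lambda_i + \lambda_j)$, which is what the integral formula in (36) gives entrywise, since $\int_0^\infty e^{-t(\lambda_i+\lambda_j)/2}\,dt = 2/(\lambda_i+\lambda_j)$. The sketch below checks the defining equation and the two expressions in (34) on hypothetical data.

```python
def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace(X):
    return sum(X[i][i] for i in range(len(X)))

n = 3
lam = [0.5, 0.3, 0.2]                               # eigenvalues of rho
rho = [[lam[i] if i == j else 0.0 for j in range(n)] for i in range(n)]

# Hypothetical self-adjoint, traceless tangent vector A.
A = [[0.2, 0.1 + 0.05j, 0.0],
     [0.1 - 0.05j, -0.1, 0.3j],
     [0.0, -0.3j, -0.1]]

# Symmetric logarithmic derivative in the eigenbasis of rho.
L = [[2 * A[i][j] / (lam[i] + lam[j]) for j in range(n)] for i in range(n)]

rhoL, Lrho = matmul(rho, L), matmul(L, rho)
residual = max(abs(rhoL[i][j] + Lrho[i][j] - 2 * A[i][j])
               for i in range(n) for j in range(n))

fisher_1 = trace(matmul(A, L))                      # Tr A L
fisher_2 = trace(matmul(rho, matmul(L, L)))         # Tr rho L^2
```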



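Let $T = T^*$ and $\rho_0$ be a density matrix. Then $D(\theta) := \exp(\theta T/2)\,\rho_0\exp(\theta T/2)$ satisfies
the differential equation

$$\frac{\partial}{\partial\theta}D(\theta) = \mathbb{J}_{D(\theta)}T \tag{37}$$

and

$$\rho(\theta) = \frac{D(\theta)}{\mathrm{Tr}\,D(\theta)} \tag{38}$$

is a kind of exponential family.

If $\mathrm{Tr}\,\rho_0 T = 0$ and $\mathrm{Tr}\,\rho_0 T^2 = 1$, then

$$\frac{\partial}{\partial\theta}\,\mathrm{Tr}\,\rho(\theta)T\,\Big|_{\theta=0} = 1$$

and $T$ is a locally unbiased estimator (of the parameter $\theta$ at $\theta = 0$). Since

$$\frac{\partial}{\partial\theta}\rho(\theta)\Big|_{\theta=0} = \mathbb{J}_0(T),$$

we have equality in the Cramér-Rao inequality, see (24). □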
Example 7 The function

$$f_\beta(x) = \beta(1-\beta)\,\frac{(x-1)^2}{(x^\beta - 1)(x^{1-\beta} - 1)} \tag{39}$$

is operator monotone if $0 < \beta < 1$.

When $A = i[\rho,B]$ is orthogonal to the commutant of the foot-point $\rho$ in the tangent
space, we have

$$F^\beta_\rho(A) = -\frac{1}{\beta(1-\beta)}\,\mathrm{Tr}\,([\rho^\beta, B][\rho^{1-\beta}, B]). \tag{40}$$

Apart from a constant factor this expression is the skew information proposed by Wigner
and Yanase some time ago (PT]). In the limiting cases $\beta \to 0$ or 1 we have

$$f_0(x) = \frac{x-1}{\log x}$$

and the corresponding Fisher information

$$\gamma_\rho(A,B) := \int_0^\infty \mathrm{Tr}\,A(\rho+t)^{-1}B(\rho+t)^{-1}\,dt \tag{41}$$

is named after Kubo, Mori, Bogoliubov etc. The Kubo-Mori inner product plays a role
in quantum statistical mechanics (see [I2], for example). In this case

$$\mathbb{J}^{-1}(B) = \int_0^\infty (\rho+t)^{-1}B(\rho+t)^{-1}\,dt \quad\text{and}\quad \mathbb{J}(A) = \int_0^1 \rho^t A\rho^{1-t}\,dt. \tag{42}$$

Therefore the corresponding quadratic cost functional is

$$\varphi_\rho[A,B] = \int_0^1 \mathrm{Tr}\,A\rho^t B\rho^{1-t}\,dt. \tag{43}$$

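Let

$$\rho_\theta := \frac{\exp(H + \theta T)}{\mathrm{Tr}\,\exp(H + \theta T)}, \tag{44}$$

where $\rho = e^H$. Assume that $\mathrm{Tr}\,e^H T = 0$. The Fréchet derivative of $e^H$ in the direction $T$
is $\int_0^1 e^{tH}Te^{(1-t)H}\,dt$. Hence $A$ is locally unbiased if

$$\int_0^1 \mathrm{Tr}\,\rho^t T\rho^{1-t}A\,dt = 1.$$

This holds if

$$A = \frac{T}{\int_0^1 \mathrm{Tr}\,\rho^t T\rho^{1-t}T\,dt}.$$

In the Cramér-Rao inequality (22) the equality holds when $\mathbb{J}_0(K) = \int_0^1 D^t K D^{1-t}\,dt$.
Note that (44) is again an exponential family, the differential equation for

$$D(\theta) = \exp(H + \theta T)$$

has the form (37) with

$$\mathbb{J}_{D(\theta)}(K) = \int_0^1 D(\theta)^t K D(\theta)^{1-t}\,dt.$$

□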
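The relation between the function (39) of Example 7 and the commutator expression of Wigner-Yanase-Dyson type can be tested in the eigenbasis of $\rho$. The sketch below assumes the Fisher information in the form $\gamma^f_\rho(A,A) = \sum_{i,j}|A_{ij}|^2/(\lambda_j f(\lambda_i/\lambda_j))$ for $\rho = \mathrm{Diag}(\lambda_1,\ldots,\lambda_n)$ and $\mathbb{J}_\rho = f(\mathbb{L}_\rho\mathbb{R}_\rho^{-1})\mathbb{R}_\rho$, and checks the identity $\gamma^{f_\beta}_\rho(A,A) = -\frac{1}{\beta(1-\beta)}\mathrm{Tr}\,([\rho^\beta,B][\rho^{1-\beta},B])$ for $A = i[\rho,B]$. The eigenvalues, the matrix $B$ and $\beta$ are hypothetical choices.

```python
def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace(X):
    return sum(X[i][i] for i in range(len(X)))

def commutator(X, Y):
    XY, YX = matmul(X, Y), matmul(Y, X)
    n = len(X)
    return [[XY[i][j] - YX[i][j] for j in range(n)] for i in range(n)]

def diag_power(lam, s):
    """rho^s for rho = Diag(lam)."""
    n = len(lam)
    return [[lam[i] ** s if i == j else 0.0 for j in range(n)] for i in range(n)]

n = 3
beta = 0.3
lam = [0.5, 0.3, 0.2]                     # distinct eigenvalues of rho
B = [[1.0, 0.5 + 1j, 0.2],
     [0.5 - 1j, -2.0, 1j],
     [0.2, -1j, 0.7]]                     # self-adjoint

def f_beta(x):
    """Function (39); only evaluated at x != 1 below."""
    return (beta * (1 - beta) * (x - 1) ** 2
            / ((x ** beta - 1) * (x ** (1 - beta) - 1)))

# Tangent vector A = i[rho, B]; its diagonal vanishes.
A = [[1j * (lam[i] - lam[j]) * B[i][j] for j in range(n)] for i in range(n)]

fisher = sum(abs(A[i][j]) ** 2 / (lam[j] * f_beta(lam[i] / lam[j]))
             for i in range(n) for j in range(n) if i != j)

t = trace(matmul(commutator(diag_power(lam, beta), B),
                 commutator(diag_power(lam, 1 - beta), B)))
rhs = -t / (beta * (1 - beta))
```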



Problem 2 It would be interesting to find more exponential families. This means
solving the differential equation

$$\frac{\partial}{\partial\theta}D(\theta) = \mathbb{J}_{D(\theta)}T, \qquad D(0) = \rho_0.$$

If the self-adjoint $T$ and the positive $\rho_0$ commute, then the solution is $D(\theta) = \exp(\theta T)\rho_0$.
A concrete example is 

Ou 
3.3 Manifolds of density matrices

Let $\mathcal{M} := \{\rho(\theta) : \theta \in G\}$ be a smooth m-dimensional manifold of invertible density
matrices. When a quadratic cost function $\varphi_0$ is fixed, the corresponding Fisher
information is a Riemannian metric on the manifold. This gives a possibility for geometric
interpretation of statistical statements [HE].

Fisher information appears not only as a Riemannian metric but as an information
matrix as well. The quantum score operators (or logarithmic derivatives) are defined
as

$$L_i(\theta) := \mathbb{J}_{\rho(\theta)}^{-1}(\partial_{\theta_i}\rho(\theta)) \qquad (1 \le i \le m) \tag{45}$$

and

$$I^Q_{ij}(\theta) := \mathrm{Tr}\,L_i(\theta)\,\mathbb{J}_{\rho(\theta)}(L_j(\theta)) \qquad (1 \le i,j \le m) \tag{46}$$

is the quantum Fisher information matrix.

The next result is the monotonicity of the Fisher information matrix.

Theorem 8 [38] Let $\beta$ be a coarse-graining sending density matrices on the Hilbert
space $\mathcal{H}_1$ into those acting on the Hilbert space $\mathcal{H}_2$ and let $\mathcal{M} := \{\rho(\theta) : \theta \in G\}$ be
a smooth m-dimensional manifold of invertible density matrices on $\mathcal{H}_1$. For the Fisher
information matrix $I^{1Q}(\theta)$ of $\mathcal{M}$ and for the Fisher information matrix $I^{2Q}(\theta)$ of $\beta(\mathcal{M}) :=
\{\beta(\rho(\theta)) : \theta \in G\}$ we have the monotonicity relation

$$I^{2Q}(\theta) \le I^{1Q}(\theta). \tag{47}$$

Assume that $F_j$ are positive operators acting on a Hilbert space $\mathcal{H}_1$ on which the
family $\mathcal{M} := \{\rho(\theta) : \theta \in G\}$ is given. When $\sum_{j=1}^n F_j = I$, these operators determine a
measurement. For any $\rho(\theta)$ the formula

$$\beta(\rho(\theta)) := \mathrm{Diag}(\mathrm{Tr}\,\rho(\theta)F_1, \ldots, \mathrm{Tr}\,\rho(\theta)F_n)$$

gives a diagonal density matrix. Since this family is commutative, all quantum Fisher
informations coincide with the classical (23) and the classical Fisher information stands
on the left-hand-side of (47). The right-hand-side can be an arbitrary quantum quantity
but it is minimal if it is based on the symmetric logarithmic derivative, see Example 6.
This particular case of the Theorem is in the paper [5].

Assume that a manifold $\mathcal{M} := \{\rho(\theta) : \theta \in G\}$ of density matrices is given together with
a statistically relevant Riemannian metric $\gamma$. Given two points on the manifold their
geodesic distance is interpreted as the statistical distinguishability of the two density
matrices in some statistical procedure.

Let $\rho_0 \in \mathcal{M}$ be a point on our statistical manifold. The geodesic ball

$$B_\varepsilon(\rho_0) := \{\rho \in \mathcal{M} : d(\rho_0, \rho) < \varepsilon\}$$

contains all density matrices which can be distinguished by an effort smaller than $\varepsilon$ from
the fixed density $\rho_0$. The size of the inference region $B_\varepsilon(\rho_0)$ measures the statistical
uncertainty at the density $\rho_0$. Following Jeffreys' rule the size is the volume measure
determined by the statistical (or information) metric. More precisely, it is better to
consider the asymptotics of the volume of $B_\varepsilon(\rho_0)$ as $\varepsilon \to 0$. It is known in differential
geometry that

$$\mathrm{Vol}(B_\varepsilon(\rho_0)) = C_m\varepsilon^m - \frac{C_m}{6(m+2)}\,\mathrm{Scal}(\rho_0)\,\varepsilon^{m+2} + o(\varepsilon^{m+2}), \tag{48}$$

where $m$ is the dimension of our manifold, $C_m$ is a constant (equal to the volume of the
unit ball in the Euclidean m-space) and Scal means the scalar curvature, see fl3 [ 3.98
Theorem]. In this way, the scalar curvature of a statistically relevant Riemannian metric
might be interpreted as the average statistical uncertainty of the density matrix (in
the given statistical manifold). This interpretation becomes particularly interesting for
the full state space endowed with the Kubo-Mori inner product as a statistically relevant
Riemannian metric.

The Kubo-Mori (or Bogoliubov) inner product is given by

$$\gamma_\rho(A,B) = \mathrm{Tr}\,(\partial_A\rho)(\partial_B\log\rho), \tag{49}$$

or (41) in the affine parametrization. On the basis of numerical evidence it was
conjectured that the scalar curvature, which is a statistical uncertainty, is monotone
in the following sense. For any coarse-graining $\alpha$ the scalar curvature at a density $\rho$
is smaller than at $\alpha(\rho)$. The average statistical uncertainty is increasing under coarse-
graining. Up to now this conjecture has not been proven mathematically. Another form
of the conjecture is the statement that along a curve of Gibbs states

$$\frac{e^{-\beta H}}{\mathrm{Tr}\,e^{-\beta H}}$$

the scalar curvature changes monotonically with the inverse temperature $\beta \ge 0$, that is, the
scalar curvature is a monotone decreasing function of $\beta$. (Some partial results are
mS.) 

Let $\mathcal{M}$ be the manifold of all invertible n x n density matrices. If we use the affine
parametrization, then the tangent space $T_\rho$ consists of the traceless self-adjoint matrices
and has an orthogonal decomposition

$$T_\rho = \{i[\rho,B] : B = B^* \in M_n\} \oplus \{A = A^* : \mathrm{Tr}\,A = 0,\ A\rho = \rho A\}. \tag{50}$$

We denote the two subspaces by $T^q_\rho$ and $T^c_\rho$, respectively. If $A_2 \in T^c_\rho$, then

$$F(\Delta(\rho/\rho))(A_2\rho^{\pm 1/2}) = A_2\rho^{\pm 1/2}$$

(for $F = f$ and $F = 1/f$, since $f(1) = 1$) implies

$$\mathrm{qCov}^f_\rho(A_1,A_2) = \mathrm{Tr}\,\rho A_1^*A_2 - (\mathrm{Tr}\,\rho A_1^*)(\mathrm{Tr}\,\rho A_2), \qquad \gamma^f_\rho(A_1,A_2) = \mathrm{Tr}\,\rho^{-1}A_1^*A_2$$

independently of the function $f$. Moreover, if $A_1 \in T^q_\rho$, then

$$\gamma^f_\rho(A_1,A_2) = \mathrm{qCov}^f_\rho(A_1,A_2) = 0.$$

Therefore, the decomposition (50) is orthogonal with respect to any Fisher information
and any quadratic cost functional. Moreover, the effect of the function $f$ and the really
quantum situation are provided by the components from $T^q_\rho$.

3.4 Skew information 

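Let $f$ be a standard function and $X = X^* \in M_n$. The quantity

$$I^f_\rho(X) := \frac{f(0)}{2}\,\gamma^f_\rho(i[\rho,X], i[\rho,X])$$

was called skew information in [19] in this general setting. The skew information is
nothing else but the Fisher information restricted to $T^q_\rho$, but it is parametrized by the
commutator.

If $\rho = \mathrm{Diag}(\lambda_1, \ldots, \lambda_n)$ is diagonal, then

$$\gamma^f_\rho(i[\rho,X], i[\rho,X]) = \sum_{i,j} \frac{(\lambda_i - \lambda_j)^2}{\lambda_j f(\lambda_i/\lambda_j)}\,|X_{ij}|^2.$$

This implies that the identity

$$f(0)\,\gamma^f_\rho(i[\rho,X], i[\rho,X]) = 2\,\mathrm{Cov}_\rho(X,X) - 2\,\mathrm{qCov}^{\tilde f}_\rho(X,X) \tag{51}$$

holds if $\mathrm{Tr}\,\rho X = 0$ and

$$\tilde f(x) := \frac{1}{2}\left((x+1) - (x-1)^2\,\frac{f(0)}{f(x)}\right). \tag{52}$$

The following result was obtained in [T3].

Theorem 9 If $f: \mathbb{R}^+ \to \mathbb{R}$ is a standard function, then $\tilde f$ is standard as well.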
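The transformation (52) can be tried on the simplest standard function with $f(0) \ne 0$. For $f(x) = (1+x)/2$ a short calculation gives $\tilde f(x) = 2x/(1+x)$, which again satisfies $\tilde f(1) = 1$ and $x\tilde f(x^{-1}) = \tilde f(x)$. The sketch below checks this numerically; the helper name `tilde` and the sample points are illustrative.

```python
def tilde(f, f0):
    """Transformation (52): f~(x) = ((x + 1) - (x - 1)^2 * f(0) / f(x)) / 2."""
    return lambda x: ((x + 1) - (x - 1) ** 2 * f0 / f(x)) / 2

def f(x):
    return (1 + x) / 2          # standard function with f(0) = 1/2

f_tilde = tilde(f, f(0.0))

xs = [0.1, 0.5, 1.0, 2.0, 7.0]  # hypothetical sample points
closed_form_errors = [abs(f_tilde(x) - 2 * x / (1 + x)) for x in xs]
symmetry_errors = [abs(x * f_tilde(1 / x) - f_tilde(x)) for x in xs]
```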



The original proof is not easy, even matrix convexity of functions of two variables is
used. Here we sketch a rather elementary proof based on the fact that $1/f \mapsto \tilde f$ is linear
and on the canonical decomposition in Theorem 6.

Lemma 1 Let $0 \le \lambda \le 1$ and $f_\lambda: \mathbb{R}^+ \to \mathbb{R}$ be a function such that

$$\frac{1}{f_\lambda(x)} := \frac{1+\lambda}{2}\left(\frac{1}{x+\lambda} + \frac{1}{1+x\lambda}\right) = g_\lambda(x).$$

Then the function $\tilde f_\lambda: \mathbb{R}^+ \to \mathbb{R}$ defined in (52) is an operator monotone standard function.

The proof of the lemma is elementary. From the lemma and Theorem 6, Theorem 9
follows straightforwardly [ID].



The skew information is the Hessian of a quasi-entropy:

Theorem 10 Assume that $X = X^* \in M_n$ and $\mathrm{Tr}\,\rho X = 0$. If $f$ is a standard function
such that $f(0) \ne 0$, then

$$\frac{\partial^2}{\partial t\,\partial s}\,S_F(\rho + t\,i[\rho,X], \rho + s\,i[\rho,X])\,\Big|_{t=s=0} = f(0)\,\gamma^f_\rho(i[\rho,X], i[\rho,X])$$

for the standard function $F = \tilde f$.

The proof is based on the formula

$$\frac{d}{dt}\,h(\rho + t\,i[\rho,X])\,\Big|_{t=0} = i[h(\rho), X],$$

see EOl.



Example 8 We compute the Hessian of the relative entropy of degree a in an exponen- 
tial parametrization: 



dtds 

where 



t=s=0 



Jo 



^ u if < M < a, 

ga{u) = — la if — a, (53) 

for a < 1/2 and for a > 1/2 = Qi-a- 
Since 

5«(e^+*^||e^+^^) = — -^TTexpaiH + sB)exp{l-a){H + tA), 



dtds a{l — a) dtds 

we calculate as follows: 

1 ^ d 

-Tt — exp a{H + sB) — exp{l - a){H + tA) 



a{l — a) ds dt 
= Tr / / exp{xaH)B exp{l — x)aH exp{y{l — a)H)Aexp{l — y){l — a)H dxdy 



^0 
1 rl 



Tr ^ ^ exp(^{xa + {l-y){l-a))H^Bexp(^{{l-x)a + y{l-a))H^Adxdy 
Try J exp (^(xa — y + ya — a + 1)H^B exp — xa + y — ya + a)H^Adxdy 



7' 

Jo 



F{—xa + y — ya + a) dxdy 
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for the functional 
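We continue

$$\begin{aligned}
\int_0^1\!\!\int_0^1 F(-x\alpha + y - y\alpha + \alpha)\,dx\,dy
&= \int_0^1\!\!\int_0^1 F(-x\alpha + y(1-\alpha) + \alpha)\,dx\,dy \\
&= \int_{x=0}^{1} \frac{1}{1-\alpha}\int_{z=0}^{1-\alpha} F(-x\alpha + z + \alpha)\,dz\,dx \\
&= \frac{1}{\alpha}\int_{w=-\alpha}^{0} \frac{1}{1-\alpha}\int_{z=0}^{1-\alpha} F(z - w)\,dz\,dw \\
&= \int_0^1 F(u)\,g_\alpha(u)\,du,
\end{aligned}$$

where $g_\alpha$ is as above. □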
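The computation of Example 8 can be sanity-checked numerically. The sketch below is an illustration under assumptions: it differentiates only the cross term $-\frac{1}{\alpha(1-\alpha)}\mathrm{Tr}\,\exp\alpha(H+sB)\exp(1-\alpha)(H+tA)$ by finite differences and compares the result with $-\int_0^1 F(u)\,g_\alpha(u)\,du$, where $F(u) = \mathrm{Tr}\,e^{(1-u)H}Be^{uH}A$ and $g_\alpha(u) = \min(u, 1-u, \alpha)/(\alpha(1-\alpha))$ for $\alpha \le 1/2$ (a compact way of writing the three-piece weight). Real symmetric matrices are used for simplicity, and all names are hypothetical.

```python
import numpy as np

def sym_expm(M):
    """Matrix exponential of a real symmetric matrix via eigendecomposition."""
    w, V = np.linalg.eigh(M)
    return (V * np.exp(w)) @ V.T

def g_alpha(u, a):
    """Piecewise linear weight; g_alpha = g_{1-alpha} for alpha > 1/2."""
    a = min(a, 1.0 - a)
    return np.minimum(np.minimum(u, 1.0 - u), a) / (a * (1.0 - a))

rng = np.random.default_rng(2)

def rand_sym(n, scale):
    M = rng.normal(size=(n, n))
    return scale * (M + M.T) / 2

n, alpha = 3, 0.3
H, A, B = rand_sym(n, 0.5), rand_sym(n, 0.5), rand_sym(n, 0.5)

def cross_term(t, s):
    # -(1/(alpha(1-alpha))) Tr exp(alpha(H+sB)) exp((1-alpha)(H+tA))
    P = sym_expm(alpha * (H + s * B))
    Q = sym_expm((1 - alpha) * (H + t * A))
    return -np.trace(P @ Q) / (alpha * (1 - alpha))

# Mixed second derivative at t = s = 0 by central differences.
h = 1e-3
hessian_fd = (cross_term(h, h) - cross_term(h, -h)
              - cross_term(-h, h) + cross_term(-h, -h)) / (4 * h * h)

# F(u) = Tr e^{(1-u)H} B e^{uH} A, evaluated in the eigenbasis of H.
w, V = np.linalg.eigh(H)
C = (V.T @ B @ V) * (V.T @ A @ V).T          # C[i, j] = B_ij * A_ji
u = np.linspace(0.0, 1.0, 4001)
E = np.exp((1 - u)[:, None, None] * w[None, :, None]
           + u[:, None, None] * w[None, None, :])
F = (E * C).sum(axis=(1, 2))

def trapezoid(y, x):
    return np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2

mass = trapezoid(g_alpha(u, alpha), u)       # g_alpha should integrate to 1
hessian_formula = -trapezoid(F * g_alpha(u, alpha), u)

print(hessian_fd, hessian_formula, mass)
```

The weight $g_\alpha$ is the density of the sum of two independent uniform variables on $[0,\alpha]$ and $[0,1-\alpha]$, which is why it integrates to 1.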



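We turn to the second derivative

$$\frac{\partial^2}{\partial t\,\partial s}\, S_\alpha(e^{H+tA+sB}\,\|\,e^{H})\Big|_{t=s=0}.$$

We know that

$$S_\alpha(e^{H+tA+sB}\,\|\,e^{H}) = \frac{1}{\alpha(1-\alpha)}\,\mathrm{Tr}\,\big(\exp(H+tA+sB) - \exp(1-\alpha)(H+tA+sB)\,\exp(\alpha H)\big),$$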



$$\frac{\partial^2}{\partial t\,\partial s}\exp(H+tA+sB)\Big|_{t=s=0} = \int_0^1\!\!\int_0^s e^{(1-s)H} B\, e^{(s-u)H} A\, e^{uH}\,du\,ds,$$

therefore 

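$$\frac{\partial^2}{\partial t\,\partial s}\exp(1-\alpha)(H+tA+sB)\Big|_{t=s=0} = (1-\alpha)^2\int_0^1\!\!\int_0^s e^{(1-s)(1-\alpha)H} B\, e^{(s-u)(1-\alpha)H} A\, e^{u(1-\alpha)H}\,du\,ds,$$

therefore we obtain

$$\int_0^1\!\!\int_0^s \mathrm{Tr}\, e^{[1-(s-u)](1-\alpha)H} B\, e^{(s-u)(1-\alpha)H} A\,du\,ds = \int_0^1 (1-x)\,\mathrm{Tr}\, e^{(1-x)(1-\alpha)H} B\, e^{x(1-\alpha)H} A\,dx.$$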

If $\alpha = 0$, then we have the Kubo-Mori inner product. □



4 Von Neumann algebras 

Let $\mathcal{M}$ be a von Neumann algebra. Assume that it is in standard form: it acts on a
Hilbert space $\mathcal{H}$, $\mathcal{P} \subset \mathcal{H}$ is the positive cone and $J : \mathcal{H} \to \mathcal{H}$ is the modular conjugation.
Let $\varphi$ and $\omega$ be normal states with representing vectors $\Phi$ and $\Omega$ in the positive cone. For
the sake of simplicity, assume that $\varphi$ and $\omega$ are faithful. This means that $\Phi$ and $\Omega$ are
cyclic and separating vectors. The closure of the unbounded operator $A\Phi \mapsto A^*\Omega$ has
a polar decomposition $J\Delta(\omega/\varphi)^{1/2}$, and $\Delta(\omega/\varphi)$ is called the relative modular operator.
$A\Phi$ is in the domain of $\Delta(\omega/\varphi)^{1/2}$ for every $A \in \mathcal{M}$.

For $A \in \mathcal{M}$ and $f : \mathbb{R}^+ \to \mathbb{R}$, the quasi-entropy

$$S_f^A(\omega\|\varphi) := \langle A\Phi, f(\Delta(\omega/\varphi))A\Phi\rangle \tag{54}$$

was introduced in [33], see also Chapter 7 in [31]. Of course, (5) is a particular case.

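Theorem 11 Assume that $f : \mathbb{R}^+ \to \mathbb{R}$ is an operator monotone function with $f(0) \ge 0$
and $\alpha : \mathcal{M}_0 \to \mathcal{M}$ is a Schwarz mapping. Then

$$S_f^A(\omega\circ\alpha\,\|\,\varphi\circ\alpha) \ge S_f^{\alpha(A)}(\omega\|\varphi) \tag{55}$$

holds for $A \in \mathcal{M}_0$ and for normal states $\omega$ and $\varphi$ of the von Neumann algebra $\mathcal{M}$.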

The relative entropies are jointly convex in this setting, similarly to the finite-dimensional
case. Now we shall concentrate on the generalized variance.



4.1 Generalized covariance 

To deal with generalized covariance, we assume that $f : \mathbb{R}^+ \to \mathbb{R}$ is a standard
operator monotone (increasing) function. The natural extension of the covariance (from
probability theory) is

$$\mathrm{qCov}_\omega^f(A,B) = \big\langle \sqrt{f(\Delta(\omega/\omega))}\,A\Omega,\, \sqrt{f(\Delta(\omega/\omega))}\,B\Omega\big\rangle - \overline{\omega(A)}\,\omega(B), \tag{56}$$

where $\Delta(\omega/\omega)$ is actually the modular operator. Although $\Delta(\omega/\omega)$ is unbounded, the
definition works. For the function $f$, the inequality

$$\frac{2x}{x+1} \le f(x) \le \frac{1+x}{2}$$

holds. Therefore $A\Omega$ is in the domain of $\sqrt{f(\Delta(\omega/\omega))}$.
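The two-sided bound $2x/(x+1) \le f(x) \le (1+x)/2$ says that every standard function lies between the harmonic and the arithmetic mean. A minimal numerical illustration follows; the list of example functions is a choice made here (they are well-known standard operator monotone functions), not a list taken from the text.

```python
import numpy as np

# A few well-known standard functions: f(1) = 1 and x f(1/x) = f(x).
standard_functions = {
    "harmonic 2x/(1+x)": lambda x: 2 * x / (1 + x),
    "geometric sqrt(x)": np.sqrt,
    "logarithmic (x-1)/log(x)": lambda x: (x - 1) / np.log(x),
    "Wigner-Yanase ((1+sqrt(x))/2)^2": lambda x: ((1 + np.sqrt(x)) / 2) ** 2,
    "arithmetic (1+x)/2": lambda x: (1 + x) / 2,
}

# Even number of log-spaced points, so x = 1 (where the logarithmic mean
# needs a limit) is not on the grid.
x = np.geomspace(1e-3, 1e3, 400)
lower, upper = 2 * x / (1 + x), (1 + x) / 2
tol = 1e-12   # relative slack for rounding in the two equality cases

bounds_hold = {
    name: bool(np.all(lower <= f(x) * (1 + tol)) and np.all(f(x) <= upper * (1 + tol)))
    for name, f in standard_functions.items()
}
for name, ok in bounds_hold.items():
    print(f"{name}: {ok}")
```

The upper bound is what makes (56) well defined: $f(\Delta) \le (1+\Delta)/2$ and $A\Omega$ lies in the domain of $\Delta^{1/2}$.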

For a standard function $f : \mathbb{R}^+ \to \mathbb{R}^+$ and for a normal unital Schwarz mapping
$\beta : \mathcal{N} \to \mathcal{M}$ the inequality

$$\mathrm{qCov}_\omega^f(\beta(X), \beta(X)) \le \mathrm{qCov}_{\omega\circ\beta}^f(X,X) \qquad (X \in \mathcal{N}) \tag{57}$$

is a particular case of Theorem 11 and it is the monotonicity of the generalized covariance
under coarse-graining. The common symmetrized covariance



$$\mathrm{Cov}_\omega(A,B) := \frac{1}{2}\,\omega(A^*B + BA^*) - \overline{\omega(A)}\,\omega(B)$$

is recovered by the particular case $f(t) = (1+t)/2$.

Since 

$$\mathrm{qCov}_\omega^f(A,B) = \mathrm{qCov}_\omega^f(A - \omega(A)I,\, B - \omega(B)I),$$

it is enough to consider these sesquilinear forms on the subspace $\{A \in \mathcal{M} : \omega(A) = 0\}$.

4.2 The Cramér-Rao inequality

Let $\{\omega_\theta : \theta \in G\}$ be a smooth $m$-dimensional manifold in the set of normal states of
the von Neumann algebra $\mathcal{M}$ and assume that a collection $A = (A_1, \dots, A_m)$ of self-adjoint
operators is used to estimate the true value of $\theta$. The subspace spanned by
$A_1, A_2, \dots, A_m$ is denoted by $V$.

Given a standard matrix monotone function $f$, we have the corresponding cost function

$$\varphi_\theta[A,B] = \mathrm{qCov}_{\omega_\theta}^f(A,B)$$

for every $\theta$, and the cost matrix of the estimator $A$ is a positive semidefinite matrix,
defined by

$$\varphi_\theta[A]_{ij} = \varphi_\theta[A_i, A_j].$$

The bias of the estimator is

$$b(\theta) = (b_1(\theta), b_2(\theta), \dots, b_m(\theta)) := \big(\omega_\theta(A_1 - \theta_1 I),\, \omega_\theta(A_2 - \theta_2 I),\, \dots,\, \omega_\theta(A_m - \theta_m I)\big).$$

For an unbiased estimator we have $b(\theta) = 0$. From the bias vector we form a bias
matrix

$$B_{ij}(\theta) := \partial_{\theta_i} b_j(\theta).$$

For a locally unbiased estimator at $\theta_0$, we have $B(\theta_0) = 0$.

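The relation

$$\partial_{\theta_i}\,\omega_\theta(A) = \varphi_\theta[L_i(\theta), A] \qquad (A \in V)$$

determines the logarithmic derivatives $L_i(\theta)$. The Fisher information matrix is

$$J_{ij}(\theta) := \varphi_\theta[L_i(\theta), L_j(\theta)].$$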

Theorem 12 Let $A = (A_1, \dots, A_m)$ be an estimator of $\theta$. Then for the above defined
quantities the inequality

$$\varphi_\theta[A] \ge (I + B(\theta))\,J(\theta)^{-1}\,(I + B(\theta)^*)$$

holds in the sense of the order on positive semidefinite matrices.

Concerning the proof we refer to [38].



4.3 Uncertainty relation 

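In the von Neumann algebra setting the skew information (as a sesquilinear form) can
be defined as

$$I_\omega^f(X,Y) := \mathrm{Cov}_\omega(X,Y) - \mathrm{qCov}_\omega^{\tilde f}(X,Y) \tag{58}$$

if $\omega(X) = \omega(Y) = 0$. (Then $I_\omega^f(X) = I_\omega^f(X,X)$.)

Lemma 2 Let $\mathcal{K}$ be a Hilbert space with inner product $\langle\langle\cdot,\cdot\rangle\rangle$ and let $\langle\cdot,\cdot\rangle$ be a
sesquilinear form on $\mathcal{K}$ such that

$$0 \le \langle f, f\rangle \le \langle\langle f, f\rangle\rangle$$

for every vector $f \in \mathcal{K}$. Then

$$\big[\langle f_i, f_j\rangle\big]_{i,j=1}^m \le \big[\langle\langle f_i, f_j\rangle\rangle\big]_{i,j=1}^m \tag{59}$$

holds for every $f_1, f_2, \dots, f_m \in \mathcal{K}$.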

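Proof: Consider the Gram matrices $G := [\langle\langle f_i, f_j\rangle\rangle]_{i,j=1}^m$ and $H := [\langle f_i, f_j\rangle]_{i,j=1}^m$,
which are symmetric and positive semidefinite. For every $a_1, \dots, a_m \in \mathbb{R}$ we get

$$\sum_{i,j=1}^m \big(\langle\langle f_i, f_j\rangle\rangle - \langle f_i, f_j\rangle\big)\,a_i a_j = \Big\langle\Big\langle \sum_{i=1}^m a_i f_i,\, \sum_{i=1}^m a_i f_i\Big\rangle\Big\rangle - \Big\langle \sum_{i=1}^m a_i f_i,\, \sum_{i=1}^m a_i f_i\Big\rangle \ge 0$$

by assumption. This says that $G - H$ is positive semidefinite, hence it is clear that
$G \ge H$. □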

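Theorem 13 Assume that $f, g : \mathbb{R}^+ \to \mathbb{R}$ are standard functions and $\omega$ is a faithful
normal state on a von Neumann algebra $\mathcal{M}$. Let $A_1, A_2, \dots, A_m \in \mathcal{M}$ be self-adjoint
operators such that $\omega(A_1) = \omega(A_2) = \dots = \omega(A_m) = 0$. Then the determinant inequality

$$\det\Big(\big[\mathrm{qCov}_\omega^g(A_i, A_j)\big]_{i,j=1}^m\Big) \ge \det\Big(\big[2g(0)\,I_\omega^f(A_i, A_j)\big]_{i,j=1}^m\Big) \tag{60}$$

holds.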

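Proof: Let $E(\cdot)$ be the spectral measure of $\Delta(\omega/\omega)$. Then for $m = 1$ the inequality
is

$$\int g(\lambda)\,d\mu(\lambda) \ge g(0)\Big(\int (1+\lambda)\,d\mu(\lambda) - 2\int \tilde f(\lambda)\,d\mu(\lambda)\Big),$$

where $d\mu(\lambda) = d\langle A\Omega, E(\lambda)A\Omega\rangle$. Since the inequality

$$f(x)\,g(x) \ge f(0)\,g(0)\,(x-1)^2 \tag{61}$$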
holds for standard functions (TU], we have 

$$g(\lambda) \ge g(0)\,(\lambda-1)^2\,\frac{f(0)}{f(\lambda)} = g(0)\big(1 + \lambda - 2\tilde f(\lambda)\big),$$

and this implies the integral inequality.



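Consider the finite-dimensional subspace $\mathcal{N}$ generated by the operators $A_1, A_2, \dots, A_m$.
On $\mathcal{N}$ we have the inner products

$$\langle\langle A, B\rangle\rangle := \mathrm{qCov}_\omega^g(A,B)$$

and

$$\langle A, B\rangle := 2g(0)\,I_\omega^f(A,B).$$

Since $\langle A, A\rangle \le \langle\langle A, A\rangle\rangle$, Lemma 2 yields $[\langle A_i, A_j\rangle]_{i,j=1}^m \le [\langle\langle A_i, A_j\rangle\rangle]_{i,j=1}^m$. Both matrices are positive semidefinite, so the determinant inequality follows. □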
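A finite-dimensional illustration of Theorem 13 can be computed directly. The sketch below is not from the paper and rests on assumptions: the state is a density matrix $\rho$ with eigenvalues $\lambda_i$, the forms are taken in the eigenbasis of $\rho$ as $\mathrm{qCov}_\rho^h(A,B) = \sum_{ij} \lambda_j h(\lambda_i/\lambda_j) A_{ij} B_{ij}$ for real symmetric centered $A, B$, and the example functions are $f(x) = (1+x)/2$ and $g(x) = ((1+\sqrt{x})/2)^2$. All names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 4, 3

# Random faithful density matrix and its spectral data.
G0 = rng.normal(size=(n, n))
rho = G0 @ G0.T + 0.1 * np.eye(n)
rho /= np.trace(rho)
lam, U = np.linalg.eigh(rho)
ratio = lam[:, None] / lam[None, :]          # lam_i / lam_j

f = lambda x: (1 + x) / 2                    # example f, f(0) = 1/2
f0 = 0.5
g = lambda x: ((1 + np.sqrt(x)) / 2) ** 2    # example g, g(0) = 1/4
g0 = 0.25
f_tilde = lambda x: 0.5 * ((x + 1) - (x - 1) ** 2 * f0 / f(x))   # formula (52)
arith = lambda x: (1 + x) / 2                # gives the symmetrized covariance

def form(h, A, B):
    """Assumed matrix form of qCov^h for centered real symmetric A, B."""
    return float(np.sum(A * (lam[None, :] * h(ratio)) * B))

# Centered self-adjoint operators, written in the eigenbasis of rho.
ops = []
for _ in range(m):
    A = rng.normal(size=(n, n))
    A = (A + A.T) / 2
    A -= np.trace(rho @ A) * np.eye(n)       # Tr(rho A) = 0
    ops.append(U.T @ A @ U)

# Gram matrices of the two forms used in the proof.
G = np.array([[form(g, A, B) for B in ops] for A in ops])
H = np.array([[2 * g0 * (form(arith, A, B) - form(f_tilde, A, B)) for B in ops]
              for A in ops])

min_eig = np.linalg.eigvalsh(G - H).min()    # G - H should be positive semidefinite
det_G, det_H = np.linalg.det(G), np.linalg.det(H)
print(min_eig, det_G, det_H)
```

With these choices $\tilde f(x) = 2x/(x+1)$, and the pointwise inequality behind the result reduces to $1 + x \ge (\sqrt{x}-1)^2$.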

This theorem is interpreted as quantum uncertainty principle [HI UHl [lH [25]. In the 
earlier works the function $g$ on the left-hand side was $(x+1)/2$ and the proofs were
more complicated. The general $g$ appeared in